World Wide Web Information Retrieval Using Web Connectivity Information

Authors

Jiafei Sun

Wen-Chen Hu

Gerry V. Dozier

Dean Hendrix

Abstract

Gathering, processing, and distributing information from the World Wide Web will be a vital technology for the next century. Web search techniques have played a critical role in the development of information systems, but because web documents are so diverse, traditional search techniques must be improved. Methods based on hyperlink structure have proved to be powerful ways of exploring the relationships between web documents. In this project, a prototype web search engine was developed that exploits the link structure of web documents using the Companion algorithm. The prototype consists of a web spider, a local database, and search software, and was written in the Java programming language. The spider crawls and downloads web pages using Lynx, then saves the hyperlinks into an Oracle database; JDBC is used for the database processing. The search software builds a vicinity graph for the query URL, calculates the hub and authority weights, and returns the most closely related pages. Finally, HTML pages provide the user interface and communicate with CGI programs written in Perl.
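The hub and authority calculation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the vicinity graph is already available as a list of (source, target) link pairs and uses plain iterative hub/authority updates with normalization. It is not the project's Java code, and it does not reproduce any refinements specific to the Companion algorithm. The function names `hub_authority_scores` and `related_pages` and the toy URLs are hypothetical.

```python
import math


def _normalize(weights):
    """Scale weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {node: w / norm for node, w in weights.items()}


def hub_authority_scores(edges, iterations=50):
    """Compute hub and authority weights for a directed link graph.

    edges: iterable of (source, target) pairs (assumed input format).
    Returns two dicts: node -> hub weight, node -> authority weight.
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    hubs = {node: 1.0 for node in nodes}
    auths = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Authority weight: sum of hub weights of pages linking to the node.
        new_auths = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auths[dst] += hubs[src]
        # Hub weight: sum of authority weights of the pages the node links to.
        new_hubs = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hubs[src] += new_auths[dst]
        auths = _normalize(new_auths)
        hubs = _normalize(new_hubs)
    return hubs, auths


def related_pages(edges, query_url, top_n=5):
    """Return the top_n nodes by authority weight, excluding the query URL."""
    _, auths = hub_authority_scores(edges)
    ranked = sorted(auths, key=auths.get, reverse=True)
    return [node for node in ranked if node != query_url][:top_n]


if __name__ == "__main__":
    # Toy vicinity graph: three pages (h1..h3) linking to two candidates.
    toy_edges = [("h1", "a"), ("h1", "b"), ("h2", "a"),
                 ("h2", "b"), ("h3", "a")]
    print(related_pages(toy_edges, query_url="q", top_n=2))
```

In this toy graph, page `a` receives more in-links from hub-like pages than `b`, so it ends up with the larger authority weight and ranks first.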

Related resources

Assessing the Internal Structure of the Ellis Information Retrieval Model in Order to Present the Persian Norm of Web Retrieval Tools

Introduction: This study evaluated the internal structure of the Ellis information-seeking model in the student community, with the aim of presenting the Persian norm. Methods: This is a descriptive-analytical study conducted using a cross-sectional survey in the second semester of the academic year 1399-1400. The population comprised 280 graduate students at Ahvaz Jundishapur University of Medical Scien...

Behavioral Considerations in Developing Web Information Systems: User-centered Design Agenda

The current paper explores designing a web information retrieval system regarding the searching behavior of users in real and everyday life. Designing an information system that is closely linked to human behavior is equally important for providers and the end users. From an Information Science point of view, four approaches in designing information retrieval systems were identified as system-...

Multilingual Information Retrieval in World Wide Web

The article addresses: (1) The design of an information retrieval (IR) system, the Multilingual Information Retrieval Tool Hierarchy (MIRTH), which works with virtual corpora on the World Wide Web, also known as the Web or WWW. It is motivated by the desire to create a search engine to retrieve information by accessing a virtual. (2) The implementation of a general model of multilingual retrieval for the W...

A Comparison of Techniques to Find Mirrored Hosts on the WWW

We compare several algorithms for identifying mirrored hosts on the World Wide Web. The algorithms operate on the basis of URL strings and linkage data: the type of information easily available from web proxies and crawlers. Identification of mirrored hosts can improve web-based information retrieval in several ways: First, by identifying mirrored hosts, search engines can avoid storing and ret...

Retrieval of Web Documents Using a Fuzzy Hierarchical Clustering

The World Wide Web holds a huge amount of information, which is retrieved using information retrieval tools such as search engines. The page repository of a search engine contains the web documents downloaded by the crawler, and these documents come from a variety of domains. In this paper, a technique called “Retrieval of Web documents using a fuzzy hierarchical clustering” is being propo...

Publication date: 2001
